Automated Price Comparison Shopping Search Engine
Authors
Abstract
In this paper, we explore the possibility of creating a product search engine that can dynamically find commercial sites, independent of merchant feeds and other human involvement in the management of internal databases. We briefly evaluate the constraints of current shopping search engines and the benefits of offering a fully automated version. In addition, we consider the application of JTidy, stemmers, and wrappers to extract the relevant information from a commercial website.

Introduction

In the past, the Internet could be thought of as just a repository of static information, and search engines merely offered Internet users basic information retrieval. However, as the web has evolved into a bustling marketplace where online transactions are the norm, there is a need for more specific search capabilities. Ultimately, it is hoped that the ideal search engine reduces search costs, in terms of time and money, for consumers in a perfectly efficient market.

Currently, numerous shopping search engines already exist (Sullivan, 2003), but most are constrained by an essentially static database of available products. PriceScan lists products from a manually updated database, classified under static categories. Kelkoo and Yahoo! Shopping use similar database frameworks, in which merchants submit their products to be classified manually by the search companies according to a predetermined structure. Amazon is a distributor that sells a wide range of products but in reality maintains a finite database of either products in its inventory or registered resale products.

One search engine drew our attention because it seems to use a more dynamic approach to searching. Froogle led us to believe that a fully automated search engine was possible, because we initially thought that it scours the web for relevant products on sale instead of using a static database.
After closer investigation, we discovered that it also relies on merchant feeds, like Yahoo! Shopping, and offers free listing of products. Moreover, Froogle has other shortcomings that could be improved on (Mills, 2003).

In order to search over a database that is as dynamic as the growth of the Internet, we need to be able to construct such a database directly from web content. Hence, we divided the process into five main steps: 1) exploring the web, 2) deciding the relevancy of sites, 3) information extraction, 4) database management, and 5) information retrieval.

1 http://www.pricescan.com/shoppingguides.asp
2 Any method of database management that involves a case-by-case human decision is considered manual, whether it is the merchant or the search company who makes the decision.

CSE 401 Senior Design Project Elwin Chai & Rick Jones

Given the extensive research on the features of a search engine, there is already an established base of methods for crawling the web, database management, and information retrieval, which includes ranking a page based on a query (Lee et al., 1997). The steps that require more research are therefore site relevancy and information extraction.

Some research has been done on the automatic classification of websites (Pierre, 2001), but it has not concentrated specifically on commercial sites. Nonetheless, an important observation was made: when determining the relevancy of a web page, metadata provide critical information on top of the plain content of the page. An example of metadata is whether a word is displayed in bold or appears in the title. Google captures some pieces of metadata in a bitmap format for every keyword (Brin and Page, 2001).

Information extraction, on the other hand, has been explored through the use of wrappers (Kushmerick et al., 1997).
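To make the idea of a wrapper concrete, the following is a minimal sketch of a delimiter-based wrapper, in which each field is located by the text immediately to its left and right. It is an illustration only, not the implementation described in this paper: the function name `lr_wrapper`, the `DELIMITERS` table, and the sample markup are all hypothetical, and a real merchant page would need delimiters chosen (or learned) for its own layout.

```python
from typing import Dict, List, Tuple

# Hypothetical delimiters for one imaginary merchant page layout.
# Each field maps to (left delimiter, right delimiter).
DELIMITERS: Dict[str, Tuple[str, str]] = {
    "name": ("<b>", "</b>"),
    "price": ("<i>$", "</i>"),
}


def lr_wrapper(html: str, delimiters: Dict[str, Tuple[str, str]]) -> List[Dict[str, str]]:
    """Scan the page left to right, extracting one record per pass
    through the delimiter table; stop when a delimiter is not found."""
    records: List[Dict[str, str]] = []
    pos = 0
    while True:
        record: Dict[str, str] = {}
        for field, (left, right) in delimiters.items():
            start = html.find(left, pos)
            if start == -1:
                return records          # no further records on the page
            start += len(left)
            end = html.find(right, start)
            if end == -1:
                return records          # incomplete record: discard it
            record[field] = html[start:end].strip()
            pos = end + len(right)
        records.append(record)
```

Applied to a page such as `<b>Widget</b> <i>$9.99</i><br><b>Gadget</b> <i>$24.50</i>`, the sketch yields one name/price record per product. It also shows why such wrappers are site-specific: a change in the page's markup silently breaks the delimiters.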
There are even proposed toolkits to help construct a wrapper (Sahuguet and Azavant, 1999), which laid out the fundamentals of wrapper creation and helped us in our own practical implementation. It should be noted that even though there is general consensus on the need for wrappers, the initially proposed wrappers seem far too site-specific and idealistic to be implemented in practice. Moreover, there is still debate over how effective and practical they can be (Kushmerick, 2000). Nevertheless, we sought to integrate concepts from wrappers to aid us in data extraction.

The paper first presents the architecture on which we have chosen to build our search engine. We then explain our design choices in detail, following the steps of the workflow, along with the challenges we faced. Finally, we conclude with how future work can augment the effectiveness of this project.

Architecture

To lay out the framework of the program, we track the flow of a single document (or web page) through the system.

[Diagram 1: system architecture. Components: Internet, Webcrawler, Frontier Manager, Heuristics Manager, Extract Information, Stemmer, Keyword Manager, Database Manager, Search]
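As a rough illustration of how a single document might flow through the components named in Diagram 1, here is a small self-contained sketch. Everything in it is an assumption made for illustration: the ordering of the stages (the diagram's arrows are not available here), the class and function names, the "price plus buying cue" relevancy rule, the suffix-stripping stand-in for a real stemmer, and the in-memory dictionary that replaces the live web. It is not the system the paper implements.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from typing import Dict, List, Set

PRICE_RE = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")   # e.g. "$19.99"
LINK_RE = re.compile(r'href="([^"]+)"')
BUY_CUES = ("add to cart", "buy now", "in stock")  # assumed cue phrases


class PageText(HTMLParser):
    """Split a page into its <title> text (metadata) and its remaining text."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.chunks.append(data)


def looks_commercial(text: str) -> bool:
    """Heuristics step (toy rule): require a price and a buying cue."""
    lowered = text.lower()
    return bool(PRICE_RE.search(text)) and any(cue in lowered for cue in BUY_CUES)


def crude_stem(word: str) -> str:
    """Stand-in for a real stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class ProductIndex:
    """Keyword and database steps: stemmed inverted index, one price per URL."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.prices: Dict[str, float] = {}

    def add(self, url: str, html: str) -> bool:
        parser = PageText()
        parser.feed(html)
        parser.close()
        body = " ".join(parser.chunks)
        if not looks_commercial(body):
            return False                      # irrelevant page: not stored
        # Extraction step: keep the first price found on the page.
        self.prices[url] = float(PRICE_RE.search(body).group(1))
        for word in re.findall(r"[a-z]+", f"{parser.title} {body}".lower()):
            self.postings[crude_stem(word)].add(url)
        return True

    def search(self, query: str) -> List[str]:
        """Return URLs matching every query term, cheapest first."""
        terms = [crude_stem(w) for w in re.findall(r"[a-z]+", query.lower())]
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits, key=lambda url: self.prices[url])


def crawl(pages: Dict[str, str], seeds: List[str]) -> ProductIndex:
    """Crawler and frontier steps over an in-memory stand-in for the web."""
    index = ProductIndex()
    frontier = deque(seeds)
    seen = set(seeds)
    while frontier:
        url = frontier.popleft()
        html = pages.get(url)
        if html is None:
            continue                          # unreachable page: skip it
        index.add(url, html)
        for link in LINK_RE.findall(html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index
```

Calling `crawl(pages, seeds)` with a dictionary that maps URLs to HTML strings returns an index, and `index.search("cameras")` then lists the matching commercial pages in ascending order of price. Keeping each stage as a separate function or class mirrors the component boundaries in the diagram, so any one of them (for example, the relevancy rule) can be replaced without touching the rest.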
Similar resources
Competition In An Internet Mall: A Strategic Analysis of A New Marketing Venue
Retailers strive to differentiate themselves from competitors to avoid commoditization and consequent price competition, using tools such as store location, store layout, and product assortment. However, such tools become ineffective on the Internet, where competitors are a few clicks away, where web page design can be easily imitated, and where online shoppers can browse a variety of product c...
Product Information Retrieval on the Web: An Empirical Study
In this paper, we investigate the consumers’ perception of on-line product search using a questionnaire-based survey. We identify that the information retrieval activity of the purchase process can be performed with three Web applications: a search engine, a price comparison service, and a Web shop. The study underlines the need for linked product data as proposed by the Semantic Web. We argue ...
A Simple Model of Search Engine Pricing*
We present a simple model of how a monopolistic search engine optimally determines the average relevance of firms in its search pool. In our model, there is a continuum of consumers, who use the search engine’s pool, and there is a continuum of firms, whose entry to the pool is restricted by a price-per-click set by the search engine. We show that a monopolistic search engine may have an incent...
Online User Behavioural Modeling with Applications to Price Steering
Price steering is the practice of “changing the order of search results to highlight specific products” and product prices. In this paper, we present an initial investigation to quantify the price steering level in search results shown to different kinds of users on Google Shopping. We mimic the category of affluent users. Affluent users visit websites offering expensive services, search for luxur...
Short- and Long-term Effects of Online Advertising: Differences between New and Existing Customers
Online advertising has short- and long-term effects. Little is known, however, about which online advertising channel works best to address particular customer groups when short- and long-term effects are taken into account. We look at the sales effect of search engine marketing (SEM), banner advertising, price comparison advertising (PCA) and coupon/loyalty advertising (CLA) on new and existing c...
Query and Product Suggestion for Price Comparison Search Engines based on Query-product Click-through Bipartite Graphs
Query suggestion is a technique for generating alternative queries to facilitate information seeking, and has become a needful feature that commercial search engines provide to web users. In this paper, we focus on query suggestion for price comparison search engines. In this specific domain, suggestions provided to web users need to be properly generated taking into account whether both the se...